Inflationary dynamics for matrix eigenvalue problems 
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Many fields of science and engineering require finding eigenvalues and eigenvectors of large matri- 
ces. The solutions can represent oscillatory modes of a bridge, a violin, the disposition of electrons 
around an atom or molecule, the acoustic modes of a concert hall, or hundreds of other physical 
quantities. Often only the few eigenpairs with the lowest or highest frequency (extremal solutions) 
are needed. Methods that have been developed over the past 60 years to solve such problems include 
the Lanczos [Il[2] algorithm, Jacobi-Davidson techniques [3], and the conjugate gradient method [?]. 
Here we present a way to solve the extremal eigenvalue/eigenvector problem, turning it into a non- 
linear classical mechanical system with a modified Lagrangian constraint. The constraint induces 
exponential infiationary growth of the desired extremal solutions. 



Physical problems of importance to many fields of sci- 
ence are routinely reduced to an eigenvalue problem for 
a real symmetric, or hermitian, N x N matrix A: 

Alpn = Gnlpn (1) 

where ipn is the n*'* eigenvector with eigenvalue e„. Re- 
alistic simulations can generate matrices of dimension 
TV = 10^ or more, but often most of the matrix elements 
vanish for physical reasons, yielding a sparse matrix. 
Well established methods exist, which focus on finding 
extremal eigenpairs (or internal eigenpairs with various 
pre-conditioning strategies), such as conjugate gradient 
methods Jacobi-Davidson techniques '3^ , and Lanczos 
algorithms [1, 2, 5j. 

The bulk of the numerical effort in these approaches 
goes into the iteration step, e.g., multiplying the matrix 
A by a vector (j). The methods differ as to how infor- 
mation from each iteration is used. Lanczos and Arnoldi 
methods use a Krylov space spanned by the initial (often 
random) vector 0i and its iterates (/)„ — Acji^-i- 

Within the class of Krylov space approaches, diago- 
nalizing in the full Krylov space is variationally optimal 
and in principle cannot be bested by any other use of the 
Krylov vectors, such as the one we propose here. How- 
ever, if this were the end of the story there would be 
little need for restarting algorithms (such as implicitly 
re-started Arnoldi, used in the ARPACK library and in 
the MATLAB environment for example), or for the many 
other strategies that have been proposed. One reason for 
these strategies, and the continued activity in the field, is 
that in practice memory issues (in storing the matrix A 
and/or the Krylov vectors) and numerical stability issues 
both limit the performance of the ideal Lanczos method. 

Furthermore, the subject of optimal approaches to 
large matrix eigenvalue problems remains active due to 
special requirements associated with different problems 
(such as the need for interior eigenpairs, the number 



of eigenpairs needed, the accuracy required, etc.), and 
the existence of classes of matrices with special con- 
vergence characteristics (diagonal dominance, large or 
small spread of diagonal elements, near degeneracy of 
lowest eigenvalues, etc.). Non-Krylov space approaches, 
especially the Jacobi-Davidson method, have been in- 
vented to address some of these issues. The David- 
son method, which uses pre-conditioning, has been in- 
valuable for quantum chemistry applications, especially 
where several lowest eigenpairs are needed, the spread 
of diagonal elements is large, and the matrices, though 
sparse, still contain too many nonzero elements to store. 

Our purpose in this paper is to report an approach 
that starts from a fresh premise. Although it also relies 
on matrix-vector multiplication and is not immune to 
the issues that limit the standard approaches, it is differ- 
ent enough in its design and implementation to deserve 
special attention. The main idea is to replace a large 
real symmetric (or hermitian) A^-dimensional eigenvalue 
problem with an A'^-dimensional classical trajectory prob- 
lem, where the potential energy minimum corresponds to 
the minimum eigenvalue eo and the coordinates of this 
minimum correspond to the eigenvector ip^. Even with 
a billion dimensions, for the proper choice of parameters 
one quickly finds the minimum under "time" evolution 
(discrete iterations). There are no false local minima. 
We show that under this dynamics, the lowest eigenvec- 
tors grow exponentially fast relative to their neighbors, 
through a Lagrange multiplier that regulates this infla- 
tion. Nearby eigenpairs are also easily found. Implemen- 
tation of the algorithm is simple, flexible, and robust, 
and convergence rates are easily proved analytically. 

Associating various kinds of eigenvalue problems with 
dynamical systems is not new; it is especially popular in 
the context of quantum mechanics. Examples include the 
Pechukas approach to random matrix eigenvalue spec- 
tra [6 , the Miller-Meyer-McCurdy classical analog for 
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electronic degrees of freedom [7] , the powerful and popu- 
lar idea of replacing quantum statistical mechanics with 
a classical polymer bead system [8], and the well known 
association of classical driven, damped oscillators with 
quantum transitions in spectroscopy [S]. We single out 
the Car-Parrinello (CP) method and especially Car's 
damped CP method, proposed in the context of density 
functional theory (DFT), as most closely related to the 
present work |10j . However we treat here the general real 
symmetric (or hermitian) eigenvalue problem; there is no 
connection to DFT, the Kohn-Sham equations, or even 
quantum systems. Nonetheless, the potential is there in 
the future to try to construct on the fly CP-like methods 
that are not DFT based. 

The real symmetric eigenvalue problem is equivalent to 
finding the principal axes or "normal modes" of the har- 
monic potential defined by A(x) = J2i j ^i^ij^j^ where 
the thought of as real orthogonal coordinates (we 

will not explicitly write down expressions for the her- 
mitian case, but the extension is trivial). We adopt a 
Lagrangian approach initially, but several modifications 
to follow may take us away from a strict Lagrangian dy- 
namics. We take the Lagrangian to be 

^ = E^?-E^»^M2^j + ^ ' (2) 

i \ i / 

where the Lagrange multiplier A enforces normaliza- 
tion. Lagrange's equations require X{t) — A{x{t)) = 
J- Xi(i)^i (t), i.e., the potential energy at time t. 
The Euler-Lagrange equations read 

Xt = - ^ AijXj + X{t)xi; l<i< N . (3) 
i 

Defining pi — Xi, and applying a naive Euler integrator 
with time step 5t gives the discrete-time mapping 

p,{t + 5t) = p,{t) - [A^] - K-t)6^^ Xj{t) 5t 

J 

x,{t + dt) = Xi{t) + p^{t + St) 6t . (4) 

(More sophisticated discretizations, such as Verlet, also 
work well.) It is revealing to make a linear transformation 
to the (as yet unknown) normal (eigenvector) momenta 
and coordinates tTj , : 

TT^{t + St) = Tr,{t) - [e, ~ Xit)]^,{t) 6t 

Ut + St) = C^{t) + 7T,{t + St) St . (5) 

Iterating Eq. [4] is mathematically equivalent to iterat- 
ing Eq. [5j which is a discrete area-preserving map cor- 
responding to a set of N independent damped harmonic 
oscillators. 

For sufficiently small time step (not necessarily the 
regime we want to be in numerically) and assuming tem- 
porarily that X(t) is constant, we have 

£,i{t) = ai cos{uj^t + S^) (6) 



if ei > A, where uji = \J ei — X and Si is a real phase 
shift (which along with depends on initial conditions 
^i(O) and 7ri(0)). On the other hand, for low eigenvalues 
e-i < A, the normal coordinates evolve as 

m = (a^e-^* + 6^e-^*) , (7) 

where lo\ = ^A — e^, and the first term obviously dom- 
inates at long times. Of course A(t) is not constant in 
practice, but it does quickly become slowly decreasing. 
Thus we see that eigenmodes with eigenvalues below 
X(€) (which set always includes the ground state, by the 
variational theorem) are exponentially infiating at any 
given time i, while the higher modes become simple os- 
cillators. We will say more about the optimal choice of 
St below. 

The normalization x\ ^ \ can no longer be en- 
forced by the Lagrange multiplier X(t) when finite, and 
possibly large, time steps St are taken. That is not a 
problem, because the vector norm is easily imposed nu- 
merically before each time step, or even at the very end 
of the calculation (since we may easily generalize the 
previous expression for A to A = X^i j ^i^i.i^i/ Si ^^fi 
making Eqs. |4] and [s] equally valid for x of any norm). 
However X(t) retains the more important job of regulat- 
ing inflation, by controlling the border between inflating 
and non-inflating states. This is a key point. From this 
point of view, there is no reason for X(t) in Eq. |4] or [5] 
to be strictly defined as the current potential energy es- 
timate at time t. Instead, we may replace it with A(i), a 
time-dependent parameter under our control to help op- 
timize convergence by inflating the desired eigenmodes as 
rapidly as possible relative to the other modes. One may 
show, through a simple rescaling of variables, that this re- 
placement is mathematically equivalent to adding a time- 
dependent damping term to Eq. |3] Xi = ■ ■ ■ — 'y{t)xi{t), 
where A = A + j'^/i. 

To calculate the rate of convergence of the inflation 
method, it is sufficient to consider inflation of the ground 
state coordinate relative to the first excited state coor- 
dinate ^1, since ground state infiation relative to higher 
excited states is obviously at least as fast. From Eqs. [6] 
andj?] we have 6(i)/^i(i) e^<V^=^-V^=^)t ^ The 
exponent is peaked, and thus the ground state is ap- 
proached fastest, when the infiation border A is precisely 
equal to ei, the eigenvalue of the first excited state. In 
that case the ground state coordinate grows relative to 
every other at the fastest possible rate, 

Ut)/Ut)-e^/^>K (8) 

In practice we do not know the energy ei a priori, but 
given an estimate of the gap eoi = ei — eo, which can be 
obtained in several ways, we can set X{t) = X{t) + eoi, 
ensuring that as X{t) approaches the ground state en- 
ergy Co we get maximal possible infiation of the ground 
state. When working with a class of similar physical sys- 
tems, the simplest approach is to use the typical value 
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of the gap as our initial estimate for eoi, and set the 
initial value of A accordingly. One can do much bet- 
ter in a particular case by performing some number of 
iterations using an initial guess for egi (to cleanse the 
estimate of high lying eigenpairs), and then varying this 
guess, while noting which value gives the steepest descent 
of /i = Tr [{A — A(t))^] , which is a standard measure of 
error that requires no additional matrix-vector multipli- 
cations. The optimal estimate of the gap will result in 
the error decaying as fj, e"^"^"^*, providing an obvious 
consistency check on our estimate. 

The convergence of the algorithm is limited by the time 
step St, which we would like to take as large as possible 
to reach large times quickly. Strict accuracy is not a con- 
cern, since we are only interested in inflating the lowest 
few modes relative to the others, not in faithful integra- 
tion of the second order differential equations. However, 
too large a time step will destabilize the stable oscilla- 
tors corresponding to very high eigenvalues (very large 
e,- A in Eq.|5|, causing inflation of the wrong modes. A 
short analysis shows that Eq. [5] remains stable for large 
ei only if the time step St obeys St < 2/a;i„ax, where 
"^max — Gmax " is an uppcr bound on the possible 
values of — A. Thus we have an approximate bound 
St < which we have found works rather well in 

practice. Combining this result with the optimal rate of 
convergence in continuous time (Eq. [8| , we find that the 
number of discrete steps required for convergence scales 
as 

Of course to choose an appropriate time step St in a 
particular calculation, we need a rough estimate of the 
spectral range emax — cq. In this regard, it is interesting 
to note that starting off with too large a time step does 
not ruin the calculation, as is revealed by continuing with 
the iterations at a smaller time step: the error /i(t) of- 
ten quickly recovers, dropping dramatically in just a few 
steps. This is easily understood from our earlier analy- 
sis: Too large a step may inflate some of the highest lying 
eigenpairs, with eigenvalues ~ Cmax, ruining the mea- 
sure fi. However, these eigenpairs are precisely the ones 
killed most rapidly as soon as the time step is reduced 
to the stable region. That is, inflation has already been 
working well over most of the spectrum, but this was not 
reflected in the error measure; the errant eigenpairs are 
easily eliminated once the time step is reduced. 

The square root in Eq. [9] is a result of using a sec- 
ond order differential equation in Eq. [3] It is instruc- 
tive to consider the more straightforward idea of apply- 
ing exp{—(3A) to a trial vector (/)(0), i.e., solving the flrst 
order equation d(j){(3)/d(3 — — A0(/3). In a speciflc ba- 
sis, this gives dxi{l3)/d[3 = —J2j^i,j^ji0)' which can 
be discretized in (imaginary) time (3. This very sim- 
ple idea works of course, but only for much smaller time 



steps and with slower convergence. To prevent runaway 
growth of the high-lying eigenmodes, the time step SP 
must be chosen so that SP ~ l/(eniax — ^o), as com- 
pared with St ~ 1 / Vemax — Co in the Lagrangian case, 
and the number of steps needed for convergence scales 
as P/SP ~ (cniax ~ eo)/(ei ~ cq), i.e., the square of the 
number of time steps needed in the Lagrangian method 
(Eq. [9]). The same speedup was found for the second or- 
der damped Car-Parrinello method (TU] , and also applies 
to the conjugate gradient method. 

It is tempting to try differential equations of even 
higher order m > 3, e.g., d™-Xi/dt"^ = — ^ + ' ' ' i 

J 

but this does not work, as it is impossible for all m roots 
^ (e, + • • • Y/™- to be in the same half-plane, for cither 
sign of {ci + ■ ■ ■), and thus inflation will always occur 
both for large and small e;. 

The above analysis uncovers a problem, common to 
iterative eigenpair methods: slow convergence to the 
ground state if one or more excited state energies are 
very close to the ground state energy. A very satis- 
factory solution exists for this problem in the present 
context. Instead of waiting for inflation to separate out 
the ground state from these low-lying excited states, we 
admit the low- lying excited states into our calculation 
by choosing a window size w and performing inflation 
with \{t) — \{t) + w. This choice optimally inflates 
away all modes with energy > eg -I- w, requiring only 
~ (emax ~ c-o) / w stcps for Convergence, and the result- 
ing vector (j) contains (to any desired accuracy) only con- 
tributions from the k states within the energy window 
[eo, eo-l-w]. Subsequently, (j) is iterated an additional fc — 1 
times, saving vectors (pn and A(j)n after each iteration, 
and finally the hamiltonian A is constructed and diago- 
nalized explicitly in the subspace spanned by 0o . . . 4>k-i- 
The parameter k may be incremented until convergence 
to the lowest eigenpair is achieved. We note that the di- 
agonalization is numerically trivial for moderate k, so the 
only significant additional cost is that associated with the 
storage of the 2k vectors and A(/)„. A tradeoff between 
time and storage constraints determines the optimal win- 
dow size ly, as a larger w requires fewer iteration steps, 
but makes it necessary for a greater number of vectors 
to be simultaneously stored in memory before the final 
diagonalization. We have successfully used this approach 
on various matrices (see below for details). 

Several obvious generalizations can be implemented. 
The governing equations of the inflation method may eas- 
ily be extended to non-orthogonal basis vectors. Also, 
the eigenvectors associated with the flrst excited state, 
second excited state, and so on, can easily be found 
by evolving several vectors simultaneously and includ- 
ing constraints that enforce their orthogonality to one 
another, i.e., (ff ■ (p^ = Q for a ^ (3. For example, this 
may be accomplished by adding a term Va/B to 
the Lagrangian, where VafS is a Lagrange multiplier that 
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FIG. 1: Comparison of convergence of the present inflation method with the Lanczos and Power methods. The computational 
time m (to calculate both the lowest eigenvalue and corresponding eigenvector) is the number of matrix-vector multiplications. 
Note that the original Lanczos method requires two matrix-vector multiplications per step if the eigenstate as well as the 
eigenvalue are to obtained at the end of the calculation without storing all intermediate vectors. The implicitly restarted 
Arnoldi method (MATLAB/ARPACK) behaves very similarly to Lanczos, and is not shown. Panels (a) and (b) show results 
for some test matrices taken from [111 and (c) for a random sparse matrix. The matrix used in (d) corresponds to a model of 
strongly correlated spin polarized fermions on a triangular lattice. 



can be shown to equal xfAi,jXj. Alternatively, one 
may begin by dynamically evolving a single random vec- 
tor using an appropriate window w, where w is chosen to 
include all eigenvalues of interest. After eigenpairs lying 
outside the window have been inflated away, one per- 
forms k X k diagonalization using k iterates of the initial 
vector as discussed above to obtain approximations for k 
lowest-lying eigenpairs. The k approximate vectors may 
then be dynamically evolved individually while enforc- 
ing the orthogonality constraints to obtain any desired 
accuracy for each eigenpair. 

We have applied the inflation method to a consider- 
able variety of large matrices, including diagonally dom- 
inated sparse, random sparse, and full, with diagonal el- 
ements chosen unevenly or evenly spaced, and eigenvec- 
tors weakly, moderately, or strongly mixed, up to a size 



of 10^ X 10^ (see Fig.[I]for a selection). The limiting fac- 
tor is not the dimension of the matrix per se, but rather 
the number Nn^ of its nonzero elements, both in terms 
of storage and time required for an iteration, which both 
scale linearly with Nm- These traits are also common 
to all methods employing matrix-vector multiplication. 
The advantage of the above described diagonalization in 
the subspace spanned by . . . (pk-i is most visible in 
the model of spin polarized electrons on a triangular lat- 
tice (see Fig. [ijd)). This is a numerically very difficult 
problem because it has a high density of low-lying exci- 
tations. The low-lying states can be efficiently separated 
from the ground state by the diagonalization, leading to 
considerably better convergence. 

In Fig. [2] we show that multiple eigenpairs can be ob- 
tained simultaneously by dynamically evolving a single 
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FIG. 2: The convergence of the inflation method for the low- 
est four eigenpairs of a test matrix [llj is compared with the 
imphcitly restarted Arnoldi method (as implemented in MAT- 
LAB/ ARPACK). Exact eigenvalues are indicated by hori- 
zontal lines. In the inflation method, we diagonalize in a 
6-dimensional basis after every 6 dynamical steps. In the 
Arnoldi calculation, we use a basis of size 12. In each case, 
the computational time m represents the number of matrix- 
vector multiplications. 




FIG. 3: An arbitrary starting vector finds the global minimum 
in the pseudopotential, falling off any saddles it encounters, 
associated with eigenvalues other than the lowest one. In this 
diagram the two minima are topologically equivalent, corre- 
sponding to the same eigenvector with opposite sign. Thus it 
does not matter which well the trajectory flnds. 
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initial vector. Here we perform 6x6 diagonalization after 
every 6 steps, and show convergence of the first 4 eigen- 
pairs. An arbitrary number of extremal eigenpairs can 
be obtained by diagonalizing in a subspace generated by 
inflating several eigenpairs under an orthogonality con- 
straint . 

We have investigated a few significant variations of the 
ideas presented here and have found thus far that the in- 
flation approach works best. For example, the idea of 
using a dynamical system to flnd eigenpairs suggests the 
following alternative idea: Instead of a Lagrangian con- 
straint, consider a "soft" normalization constraint, im- 
posed by adding a smooth quartic potential term, making 
the complete pseudopotential 

V ^^XiAi^jXj+ ni'^xj -l\ . (10) 

i-j \ i / 

Again damped dynamics is used. In the limit of large k, 
the quartic potential enforces unit norm of the solution 
vector, just as the Lagrangian constraint does. Now con- 
sider the nature of the extrema of V . With no loss of 
generality, we make an orthogonal transformation to the 
(unknown) normal coordinates ry^, 

v^Y.^..^" + -i^^"-^ ■ (11) 

The extrema of this potential are given (in the rj basis) 



— = 2e,r;, + 4k ryf - I = 0; alH . (12) 



Besides the trivial extremum at the origin (all 77^ = 0), 
we have up to 2N extrema corresponding to the TV possi- 
ble eigenstates, where a single r]i = ±-\/l — ei/2K, while 
rjj — for all j ^ i- For sufficiently strong constraint 
coefficient (k > eo/2), we easily check that all extrema 
are saddles with at least one unstable direction, with the 
exception of the extremum associated with the ground 
state (i.e., the global minimum). Thus, for almost all 
(i.e., all but a set of measure zero) initial conditions, the 
trajectory leads downhill to the true minimum. It does 
not dally long on any intermediate saddles it encounters 
because of their exponential instability (see Fig.|3|. The 
global minimum is doubled to two equivalent solutions, 
which are related by pref actor of —1 and correspond to 
the same eigenstate. It also does not matter if we adhere 
strictly to the rules of classical mechanics in getting to 
the minimum; any stepping method leading to the mini- 
mum will do. Note that the solution is independent of k 
and is exact, provided only that k > eo/2. 

The inflationary approach proposed here is quite com- 
petitive with standard methods. We have not yet in- 
vestigated pre-conditioning. Although the inflationary 
method is quite general, it seems especially well suited 
to physics and quantum chemistry problems, because of 
its dynamical underpinnings. 
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